High-order Deep Neural Networks for Learning Multi-Modal Representations
نویسندگان
چکیده
In multi-modal learning, data consists of multiple modalities, which need to be represented jointly to capture the real-world ’concept’ that the data corresponds to (Srivastava & Salakhutdinov, 2012). However, it is not easy to obtain the joint representations reflecting the structure of multi-modal data with machine learning algorithms, especially with conventional neural networks. This is because the information which consists of multiple modalities has distinct statistical properties and each modality has a different kind of representation and correlational structure (Srivastava & Salakhutdinov, 2012). Also, noise exists in information from multi-modal input, which makes the information unreliable and inaccurate (Ernst & Di Luca, 2011).
منابع مشابه
Sensory Cue Integration with High-Order Deep Neural Networks
Humans can easily capture real world concepts from multi-modal signals by constructing joint representations of these signals. The joint representations may contain abstract information of multiple modalities and relationships across the modalities. Contrary to humans, it is not easy to obtain joint representations reflecting the structure of multimodal data with machine learning algorithms, es...
متن کاملLearning Neural Audio Embeddings for Grounding Semantics in Auditory Perception
Multi-modal semantics, which aims to ground semantic representations in perception, has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics. After having shown the quality of such auditorily grounded representations, we show how they can be applied t...
متن کاملMulti-modal Face Pose Estimation with Multi-task Manifold Deep Learning
Human face pose estimation aims at estimating the gazing direction or head postures with 2D images. It gives some very important information such as communicative gestures, saliency detection and so on, which attracts plenty of attention recently. However, it is challenging because of complex background, various orientations and face appearance visibility. Therefore, a descriptive representatio...
متن کاملA multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...
متن کاملDeep embodiment: grounding semantics in perceptual modalities
Multi-modal distributional semantic models address the fact that text-based semantic models, which represent word meanings as a distribution over other words, suffer from the grounding problem. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag of visual words met...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016